Subspace models for document script and language identification
Internal identifier: 000672 (Main/Exploration); previous: 000671; next: 000673
Authors: T. N. Vikram [France]; K. Chidananda Gowda [India]
Source:
- International Journal of Imaging Systems and Technology [0899-9457]; 2010-06.
English descriptors
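- KwdEn: 2DFLD, 2DPCA, OCR, document image processing, language identification, script identification, subspace models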
Abstract
In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010
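The abstract names 2DPCA as one of the subspace models evaluated. As a rough illustration of the general technique (not the authors' experimental setup), the sketch below fits a 2DPCA projection on a stack of equally sized grayscale images and classifies a query image by nearest neighbour in the projected space. The function names, the synthetic stripe patterns, and the labels `script_a` / `script_b` are hypothetical stand-ins chosen for this example; real use would substitute normalized word or paragraph images.

```python
import numpy as np


def fit_2dpca(images, num_components):
    """Learn 2DPCA projection axes from equally sized grayscale images."""
    stack = np.asarray(images, dtype=float)           # shape (M, rows, cols)
    centered = stack - stack.mean(axis=0)
    # Image scatter matrix: G = (1/M) * sum_k (A_k - mean)^T (A_k - mean)
    scatter = np.einsum("kij,kil->jl", centered, centered) / stack.shape[0]
    # eigh returns eigenvalues in ascending order, so reverse the columns
    # to put the eigenvectors with the largest eigenvalues first.
    _, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, ::-1][:, :num_components]       # shape (cols, d)


def project(image, axes):
    """Feature matrix Y = A X, of shape (rows, d)."""
    return np.asarray(image, dtype=float) @ axes


def classify(image, axes, train_features, train_labels):
    """Nearest neighbour; distance is the sum of column-wise Euclidean norms."""
    feature = project(image, axes)
    distances = [np.linalg.norm(feature - ref, axis=0).sum()
                 for ref in train_features]
    return train_labels[int(np.argmin(distances))]


# Hypothetical toy data: two stripe patterns standing in for two scripts.
rng = np.random.default_rng(0)
rows, cols = 16, 16
horizontal = np.zeros((rows, cols))
horizontal[::2, :] = 1.0
vertical = np.zeros((rows, cols))
vertical[:, ::2] = 1.0

train_images = [pattern + 0.1 * rng.standard_normal((rows, cols))
                for pattern in (horizontal, vertical) for _ in range(5)]
train_labels = ["script_a"] * 5 + ["script_b"] * 5

axes = fit_2dpca(train_images, num_components=3)
train_features = [project(img, axes) for img in train_images]

query = vertical + 0.1 * rng.standard_normal((rows, cols))
predicted = classify(query, axes, train_features, train_labels)
print(predicted)
```

Unlike classical PCA, this variant works on the image matrices directly rather than on flattened vectors, so the scatter matrix is only `cols` by `cols`. The number of retained axes (3 here) is an arbitrary choice for the toy data.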
Url: https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf
DOI: 10.1002/ima.20215
Affiliations: France (Région Normandie, Caen); India
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000973
- to stream Istex, to step Curation: 000962
- to stream Istex, to step Checkpoint: 000252
- to stream Main, to step Merge: 000677
- to stream Main, to step Curation: 000672
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Subspace models for document script and language identification</title>
<author><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</author>
<author><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1002/ima.20215</idno>
<idno type="url">https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000973</idno>
<idno type="wicri:Area/Istex/Curation">000962</idno>
<idno type="wicri:Area/Istex/Checkpoint">000252</idno>
<idno type="wicri:doubleKey">0899-9457:2010:Vikram T:subspace:models:for</idno>
<idno type="wicri:Area/Main/Merge">000677</idno>
<idno type="wicri:Area/Main/Curation">000672</idno>
<idno type="wicri:Area/Main/Exploration">000672</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Subspace models for document script and language identification</title>
<author><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>GREYC, Université de Caen. 6, Boulevard du Maréchal Juin, 14050 CAEN CEDEX</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Basse-Normandie</region>
<settlement type="city">CAEN</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>International School of Information Management, University of Mysore, 3004, “Udayaravi” 5th Main, 12th Cross V. V. Puram, Mysore, Karnataka</wicri:regionArea>
<wicri:noRegion>Karnataka</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2010-06">2010-06</date>
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="140">140</biblScope>
<biblScope unit="page" to="148">148</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<idno type="DOI">10.1002/ima.20215</idno>
<idno type="ArticleID">IMA20215</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>2DFLD</term>
<term>2DPCA</term>
<term>OCR</term>
<term>document image processing</term>
<term>language identification</term>
<term>script identification</term>
<term>subspace models</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="fr">In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>Inde</li>
</country>
<region><li>Basse-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement><li>CAEN</li>
</settlement>
</list>
<tree><country name="France"><region name="Région Normandie"><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</region>
</country>
<country name="Inde"><noRegion><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000672 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000672 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104 |texte= Subspace models for document script and language identification }}
This area was generated with Dilib version V0.6.32.